Skip to content

feat(orchestrator): make merge-conflict check asynchronous via runway#245

Merged
behinddwalls merged 1 commit into
mainfrom
preetam/mergeconflict-async
Jun 18, 2026
Merged

feat(orchestrator): make merge-conflict check asynchronous via runway#245
behinddwalls merged 1 commit into
mainfrom
preetam/mergeconflict-async

Conversation

@behinddwalls

@behinddwalls behinddwalls commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

Why?

The merge-conflict check ran synchronously inside the validate consumer by calling the mergechecker extension inline. A real merge attempt is slow and I/O-heavy, so doing it on the partition lease blocks the pipeline and couples SubmitQueue to the checker's latency. This moves the check to an asynchronous round-trip with runway, modelled on build/buildsignal but across a service boundary.

What?

The pipeline gains validate ⇢ (runway) ⇢ mergeconflictsignal → batch:

  • validate drops the inline mergechecker call and, after its existing dedup + change-metadata + claim work, publishes the full MergeRequest to the runway-owned merge-conflict-checker queue, keyed by the request id as the client-owned correlation id. The id round-trips, so the result correlates straight back to the request (unlike build, whose server-generated id needs a mapping store). The hand-off to runway is retryable.
  • mergeconflictsignal (new) consumes runway's MergeResult off merge-conflict-checker-signal, advances the request to batch when mergeable, or fails it when conflicted.
  • DLQ reconcilers drive the request to Error on dead-letter; the signal DLQ reads the request id straight off the result.

The check is triggered directly from validate, not a standalone mergeconflict trigger stage: that hop only forwarded the request id before publishing to runway, so folding it into validate saves a queue round-trip and removes a stage (its topic key, consumer, and DLQ entry) with no change to the cross-boundary contract.

Crossing the runway boundary is why these payloads carry full data rather than entity IDs; the queue-payload-boundary rule is documented in CLAUDE.md, with the pipeline diagram and stage table updated in workflow.md and the superseded mergechecker validate-path row noted in extension-contract.md. The mergechecker package is left in-tree (unused on the validate path); removing it is a follow-up. Runway's service implementation is out of scope — only its contract (added in #260) is consumed here.

Test Plan

  • bazel build //...
  • bazel test //... --test_tag_filters=-integration,-e2e (56 tests pass)
  • make gazelle clean

Issues

Stack

  1. refactor(api/base): name the message queue option topic_keys #264
  2. docs(rfc): add message queue contract RFC #259
  3. feat(api/runway): add external merge queue contract #260
  4. @ feat(orchestrator): make merge-conflict check asynchronous via runway #245
  5. feat(merge): make merge asynchronous via runway #247

Comment thread doc/rfc/submitqueue/extension-contract.md Outdated
Comment thread submitqueue/core/topickey/topickey.go Outdated
Comment thread submitqueue/orchestrator/controller/mergeconflictsignal/mergeconflictsignal.go Outdated
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from a51a86b to d565b91 Compare June 16, 2026 03:48
@behinddwalls behinddwalls marked this pull request as ready for review June 16, 2026 15:46
@behinddwalls behinddwalls requested review from a team and sbalabanov as code owners June 16, 2026 15:46
@behinddwalls behinddwalls force-pushed the preetam/runway-contract branch from edf7ab9 to 3e8071d Compare June 16, 2026 17:09
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from d565b91 to 00ed26c Compare June 16, 2026 17:09
Comment thread doc/rfc/submitqueue/extension-contract.md Outdated
Comment thread doc/rfc/submitqueue/workflow.md Outdated
Comment thread doc/rfc/submitqueue/workflow.md Outdated
Comment thread submitqueue/orchestrator/controller/dlq/mergeconflictsignal.go Outdated
Comment thread submitqueue/orchestrator/controller/mergeconflict/mergeconflict.go Outdated
Comment thread submitqueue/orchestrator/controller/mergeconflict/mergeconflict.go Outdated
@behinddwalls behinddwalls force-pushed the preetam/runway-contract branch from 3e8071d to cd798f6 Compare June 16, 2026 20:12
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from 00ed26c to 2ee27cf Compare June 16, 2026 20:12
@behinddwalls behinddwalls requested a review from sbalabanov June 16, 2026 20:12
@behinddwalls behinddwalls force-pushed the preetam/runway-contract branch from cd798f6 to 3e8071d Compare June 17, 2026 06:00
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch 2 times, most recently from a970815 to bc59a27 Compare June 17, 2026 06:09
@behinddwalls behinddwalls force-pushed the preetam/runway-contract branch from 3e8071d to 78d66a5 Compare June 17, 2026 06:09
Base automatically changed from preetam/runway-contract to main June 17, 2026 06:15
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from bc59a27 to d9dc61f Compare June 17, 2026 06:20
@behinddwalls behinddwalls marked this pull request as draft June 17, 2026 16:56
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from d9dc61f to 7ab64d3 Compare June 17, 2026 17:33
@behinddwalls behinddwalls changed the base branch from main to preetam/messagequeue-runway June 17, 2026 17:33
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from 35e8531 to cee7fb5 Compare June 17, 2026 18:00
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch 3 times, most recently from e0d4543 to f8bc7f6 Compare June 17, 2026 23:42
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from e4512bb to 7f52edf Compare June 17, 2026 23:42
@behinddwalls behinddwalls marked this pull request as ready for review June 18, 2026 00:08
@behinddwalls behinddwalls changed the title feat(mergeconflict): make merge-conflict check asynchronous via runway feat(orchestrator): make merge-conflict check asynchronous via runway Jun 18, 2026
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from 7f52edf to 265fcfe Compare June 18, 2026 04:22
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch 2 times, most recently from b337b8f to 9c029ed Compare June 18, 2026 04:43
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from 265fcfe to 802e3a1 Compare June 18, 2026 04:43
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from 9c029ed to be8fc07 Compare June 18, 2026 17:07
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from 802e3a1 to 4316027 Compare June 18, 2026 17:07
behinddwalls added a commit that referenced this pull request Jun 18, 2026
## Summary
### Why?

The message queue topic-binding proto option was named `topics`, which
reads as a concrete wire topic name. It is not — it carries a stable
**logical topic key**. Each implementer maps the key to whatever topic
name its broker/queue requires (subject to that backend's naming
constraints); on our Go side the keys are `consumer.TopicKey` values,
mapped to concrete names through `TopicRegistry`. The name `topics`
invited the wrong mental model.

### What?

Rename the `google.protobuf.MessageOptions` extension `topics` →
`topic_keys` (field number `50001` unchanged, so the wire/extension
layout is identical) and reframe its doc comment to say it carries a
logical key, not a wire name. Regenerated `messagequeue.pb.go`
(`E_Topics` → `E_TopicKeys`).

This is the base of a stack; the runway contract and the contract RFC
that consume the option are updated in the branches stacked on top.

## Test Plan
✅ `make proto` — descriptor field renamed (`name=topics` →
`name=topic_keys`), field number `50001` unchanged
✅ `./tool/bazel build //...`

## Issues


## Stack
1. @ #264
1. #259
1. #260
1. #245
1. #247
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from 4316027 to 107f266 Compare June 18, 2026 18:22
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from be8fc07 to 32c633a Compare June 18, 2026 18:22
behinddwalls added a commit that referenced this pull request Jun 18, 2026
## Summary
### Why?

Queue payloads are Go structs serialized with `encoding/json`, so the
wire shape is defined only by Go source. There is no language-neutral
contract a non-Go client can compile against, no explicit
topic-to-payload binding, and no distinction between a domain's private
wiring and a published cross-domain contract.

### What?

Doc-only — the design of record for message queue contracts
(`doc/rfc/messagequeue-contract.md`):

- **Contract language: Protobuf**, serialized as **protobuf JSON**
(`protojson`) so payloads stay self-describing JSON on the wire. The
`.proto` is the authority and the Go binding is generated from it, so it
cannot drift (no hand-authored struct, no drift test to keep them in
sync).
- **Topic binding:** a custom `topics` proto option (defined in
`api/base/messagequeue`) carries the wire topic names on the message
itself, read back by reflection — not on the publish/consume hot path,
which still resolves topics from a `consumer.TopicKey` via the registry.
- **Location by audience:** external contracts (something outside the
owning domain depends on them) live in `api/{domain}/messagequeue/`;
internal ones in `{domain}/core/messagequeue/`, with the split enforced
by Bazel `visibility`.
- Documents the accepted protojson conventions (snake_case field names,
UPPER_SNAKE enums, int64-as-string) and why JSON Schema, binary proto,
and Avro were rejected.

## Test Plan
- ✅ `make lint` (doc-only)

## Issues


## Stack
1. #264
1. @ #259
1. #260
1. #245
1. #247
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from 107f266 to b50d730 Compare June 18, 2026 18:37
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from 32c633a to 9789d66 Compare June 18, 2026 18:37
@behinddwalls behinddwalls force-pushed the preetam/messagequeue-runway branch from b50d730 to ac249c7 Compare June 18, 2026 22:22
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from 9789d66 to f87ee0d Compare June 18, 2026 22:22
behinddwalls added a commit that referenced this pull request Jun 18, 2026
## Summary
### Why?

Runway's merge queues are the first cross-domain message queue contract:
a client (potentially non-Go) publishes a merge request and consumes the
result without access to Runway's Go types or storage. Per the RFC
(#259) this needs a language-neutral, proto-defined contract.

### What?

Establishes the message queue contract pattern using Runway's merge
queues as the reference:

- Adds `api/runway/messagequeue/proto/merge.proto` defining
`MergeRequest`/`MergeResult`, reusing the shared `Change` and `Strategy`
types and the `topics` option from `api/base`; generated into `protopb`.
- Payloads are serialized as **protobuf JSON** via thin `protojson`
helpers (`MergeRequestToBytes`/`MergeRequestFromBytes` and the
`MergeResult` counterparts); a `Topics()` reflection helper exposes the
topic binding. Topic keys are co-located with the contract.
- Moves the merge bindings out of `runway/entity` into
`api/runway/messagequeue`.
- A drift test round-trips the payloads and asserts every Runway topic
key is bound to exactly one message via the `topics` option.

## Test Plan
- ✅ `./tool/bazel test //api/runway/messagequeue:messagequeue_test`
- ✅ `./tool/bazel build //...`

## Issues


## Stack
1. #264
1. #259
1. @ #260
1. #245
1. #247
The merge-conflict check ran synchronously inside the `validate` consumer by calling the `mergechecker` extension inline. A real merge attempt is slow and I/O-heavy, so doing it on the partition lease blocks the pipeline and couples SubmitQueue to the checker's latency. This moves the check to an asynchronous round-trip with runway, modelled on `build`/`buildsignal` but across a service boundary.

The pipeline gains `validate ⇢ (runway) ⇢ mergeconflictsignal → batch`:

- `validate` drops the inline `mergechecker` call and, after its existing dedup + change-metadata + claim work, publishes the full `MergeRequest` to the runway-owned `merge-conflict-check` queue, keyed by the request id as the client-owned correlation id. The id round-trips, so the result correlates straight back to the request (unlike `build`, whose server-generated id needs a mapping store). The hand-off to runway is retryable.
- `mergeconflictsignal` (new) consumes runway's `MergeResult` off `merge-conflict-check-signal`, advances the request to `batch` when mergeable, or fails it when conflicted.
- DLQ reconcilers drive the request to `Error` on dead-letter; the signal DLQ reads the request id straight off the result.

The check is triggered directly from `validate`, not a standalone `mergeconflict` trigger stage: that hop only forwarded the request id before publishing to runway, so folding it into `validate` saves a queue round-trip and removes a stage (its topic key, consumer, and DLQ entry) with no change to the cross-boundary contract.

Crossing the runway boundary is why these payloads carry full data rather than entity IDs; the queue-payload-boundary rule is documented in CLAUDE.md, with the pipeline diagram and stage table updated in workflow.md and the superseded `mergechecker` validate-path row noted in extension-contract.md. The `mergechecker` package is left in-tree (unused on the validate path); removing it is a follow-up. Runway's service implementation is out of scope -- only its contract (added in the parent PR) is consumed here.

✅ `bazel build //...`
✅ `bazel test //... --test_tag_filters=-integration,-e2e` (56 tests pass)
✅ `make gazelle` clean
@behinddwalls behinddwalls changed the base branch from preetam/messagequeue-runway to main June 18, 2026 22:44
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from f87ee0d to 370d4cd Compare June 18, 2026 22:44
@behinddwalls behinddwalls merged commit f3462ad into main Jun 18, 2026
14 of 26 checks passed
@behinddwalls behinddwalls deleted the preetam/mergeconflict-async branch June 18, 2026 22:46
behinddwalls added a commit that referenced this pull request Jun 18, 2026
## Summary
### Why?

The committing merge ran synchronously inside the orchestrator, blocking
the partition lease on a slow, I/O-heavy operation and coupling
SubmitQueue to the merger's latency. This moves the merge to an
asynchronous round-trip with Runway — mirroring the merge-conflict check
(#245), but for the committing merge.

### What?

Adds `batch → merge ⇢ (runway) ⇢ mergesignal → conclude/speculate`:

- `merge` (new) consumes a batch ready to land, builds the full
`MergeRequest` from the batch's member requests (one `MergeStep` per
request in Contains order, carrying each request's change and land
strategy), and publishes it to Runway's `merger` queue keyed by the
batch id as the client-owned correlation id.
- `mergesignal` (new) consumes Runway's `MergeResult` off
`merger-signal`, correlates by the echoed id, and transitions the batch
to `Succeeded` (merged) or `Failed` (could not) — then fans out to
`conclude` (so member requests pick up the outcome) and `speculate` (so
dependents can re-plan). Purely result-driven; no poll loop.
- DLQ reconcilers drive the batch to a terminal state on dead-letter.

Reuses the Runway merge contract from #260 — the same
`MergeRequest`/`MergeResult`, carried on the committing
`merger`/`merger-signal` topics rather than the dry-run check topics.
Runway's service implementation is out of scope.

## Test Plan
- ✅ `./tool/bazel build //...`
- ✅ `./tool/bazel test //... --test_tag_filters=-integration,-e2e` (57
tests pass)

## Issues


## Stack
1. #264
1. #259
1. #260
1. #245
1. @ #247
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants